ABSTRACT. In this paper, we present a novel system, inVideo, for video data analytics, and its use 
in transforming linear videos into interactive learning objects. InVideo is able to analyze video 
content automatically without the need for initial viewing by a human. Using a highly efficient 
video indexing engine we developed, the system is able to analyze both language and video 
frames. The time-stamped commenting and tagging features make it an effective tool for 
increasing interactions between students and online learning systems. Our research shows that 
inVideo presents an efficient tool for learning technology research and increasing interactions in 
an online learning environment. Data from a cybersecurity program at the University of 
Maryland show that, with inVideo used as an adaptive assessment tool, student-student and 
student-faculty interactions in online classrooms increased significantly across 24 
sections program-wide. 

1 INTRODUCTION 

Learning technology practitioners and researchers face the challenges of analyzing enormous amounts 
of digital data in the new cultures of learning that are emerging, especially videos on Massive Open 
Online Courses (MOOCs). Attempting to address the challenge of initial analysis and selection of digital 
video data can provide significant benefit for learning and research in the field. 

Big data analytics are used to collect, curate, search, analyze, and visualize large data sets generated 
from sources such as texts (including blogs and chats), images, videos, logs, and sensors (Bakshi, 2012). 
Video data is a major format of unstructured data, and should be an indispensable area of big data 
analytics. However, most analytics tools are only able to use structured data. Due to the nature of the 
special file format, traditional search engines cannot penetrate into videos, and therefore video indexing 
becomes a problem. 

Videos contain both audio and visual components, and neither of these components is text based. To 
understand a video, viewers must actually play it and use their eyes and ears to analyze the sounds and 
visuals being presented to them. Without watching a video, it is hard to glean information from its 
content or even know whether there is information to be found within. Existing search engines and data 
analytics tools such as Google, SAS, SPSS, and Hadoop are effective only in analyzing text and image 
data. Video data, however, are difficult to index and therefore difficult to analyze. 

In education, video presents a large opportunity for both classroom and online education (Rideout, 
Foehr, & Roberts, 2010). In addition, video is a great teaching format because it can be both more 
enjoyable and more memorable than other instruction formats (Choi & Johnson, 2005). Furthermore, 
video instruction allows for students to work at their own pace, for teachers to be able to teach more 
students, and for more reusable teaching materials to be available when compared to an in-person 
lecture. MOOC creators realize the many benefits of video, as evidenced by the prevalence of video in 
MOOCs. Many MOOCs focus on video files for the bulk of their instructional material so it is clear that 
the MOOCs of the future must also focus on videos. 

Structuring video data and accurately modelling relevant metrics has value for future courseware 
applications, especially MOOCs. A MOOC can be enhanced or limited by the design of the user interface, 
as well as the analytics by which the MOOC is assessed. InVideo provides rich data on videos, turns them 
into more effective and interactive learning objects, and improves MOOCs. One teacher cannot 
personally oversee the development of thousands of students, so leveraging technology to develop 
sound analytics is essential to the success of MOOCs. Yet education at such a large scale is a 
relatively novel problem. Wide use of inVideo could potentially help address this growing problem. 

InVideo (Wang & Behrmann, 2010), developed under a US Department of Education grant, is able to 
analyze video content (language and video frames) prior to initial close researcher review of the video. A 
highly efficient video indexing engine can analyze both language and video frames based on natural 
language and referent objects. Once a video is indexed, its content becomes searchable and statistical 
analysis as well as qualitative analysis are possible. Commenting and tagging add a layer of interaction 
between students and online courses. They also increase the accuracy of the transcript, which was 
automatically extracted from the video by the inVideo tool. The indexing technology is especially useful 
in mining video data for learning events. InVideo supports initial data selection by analyzing video 
data in two ways: finding keywords that were spoken in the video, and identifying an object in the 
video when a reference picture is provided. InVideo also has an automatic caption system 
that can transcribe the words spoken in the video. Instructors can use the tool to construct in-place 
video quizzes for assessments. 

Learning is an integration of interaction. The interaction might exist between learners and instructors or 
between learners and computers. While the traditional approach would be to analyze grades at the end 
of the semester, this lacks the benefits that come from interactions that occur during the course (Elias, 
2011). As an increasingly large number of educational resources move online, analyzing interactions 
between students and online course material is becoming more important. Many learning management 
systems (LMSs) have built-in learning analytics tools to look into the data. Because of limitations in 
data gathering and indexing, the built-in tools are generally not sufficient for assessing study outcomes, 
especially for video content. 

2 RELATED WORK 

Angel, Javier, Pablo, and Baltasar (2012) proposed a way to track student interactivity by logging their 
interactions with educational video games. Their conclusion was that even simple semi-automatic 
tracking, which combines computer and human effort, shows an advantage over most video-related 
systems that are fed lower-quality data. 

Big data and learning analytics can become part of the solutions integrated into administrative and 
instructional functions of higher education (Picciano, 2012). Traditional face-to-face instruction supports 
a traditional data-driven decision-making process. Videos, as a form of big data, call for more extensive 
and especially time-sensitive learning analytics applications, so it is important that instructional 
transactions are collected as they occur. 

Learning analytics can provide powerful tools to support teachers in the iterative process of improving 
the effectiveness of their course and to collaterally enhance their students' performance (Dyckhoff, 
Zielke, Bultmann, Chatti, & Schroeder, 2012). Dyckhoff developed a toolkit to enable teachers to explore 
and correlate learning object usage, user behaviour, as well as assessment results based on graphical 
indicators. This learning analytics system can analyze data such as time spent, areas of interest, usage of 
resources, participation rates and correlation with grades data and visualize them using a dashboard. 
However, the system is unable to analyze the interactions between students and the online learning 
systems on videos. 

Haubold and Kender (2004) introduced techniques for extracting, analyzing, and visualizing textual 
contents from instructional videos. They obtained transcripts from videos of university courses. Using 
this information, they indexed the transcripts and displayed them in graphs that help in understanding 
the overall course structure. Unfortunately, the indexing process takes only static data; no interactive 
data are included. 

In order to improve interactions between students and online course material, especially videos, we 
have developed a video index engine to look at every word spoken in the video and categorize it using 
our custom index algorithm. In addition, a content-based pattern recognition engine can search 
individual frames of the video to recognize objects and individuals being displayed. The collaborative 
commenting, tagging, and in-place quizzes make videos more accessible and also increase the accuracy 
of the search engine. 

3 VIDEO INDEXING AND SEARCH 

Videos are a different data type than text and images, in that they are unstructured data. Traditional 
search engines are mostly text based, with a few tools that allow for searching of images. In order to 
index a video, a search engine needs to extract meaningful language from the audio and convert it to 
text, while simultaneously converting the visual frames into a series of images that can be used to 
recognize persons and objects in the video. This is an extremely difficult task, given that videos are a 
compound format. Not only are the audio and visual components integrated, but also within each of 
these components there is a blend of information being presented in a manner that cannot be 
distinguished as easily by a computer as by a human brain. For example, the audio of the file may 
contain speech, music, and background noises that a computer will have a hard time recognizing and 
analyzing. 

3.1 Automatic Indexing Algorithm 

The video indexing engine uses the vector space model to represent the document by a set of possible 
weighted content terms. The weight of the term reflects its importance in relation to the meaning of the 
document (Wang, Chen, & Behrmann, 2004). After calculating the normalized frequency of a term in the 
document, the weight to measure the relative importance of each concept or single term is obtained. 
The automatic index algorithm then calculates the final position in n-dimensional space. The result is to 
be used for generating search results or visualization. 
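
As a rough illustration of the vector space idea, the sketch below computes normalized term frequencies for a transcript and compares the resulting vector with a query vector using cosine similarity. The paper does not specify the exact weighting or distance measure used by inVideo, so the weighting scheme, the function names `term_weights` and `cosine_similarity`, and the sample text are all assumptions made for this example.

```python
import math
from collections import Counter


def term_weights(text):
    """Return normalized term-frequency weights for one transcript.

    Assumption: weight = term count / total terms. The paper's own
    weighting may differ.
    """
    terms = text.lower().split()
    counts = Counter(terms)
    total = len(terms)
    return {term: count / total for term, count in counts.items()}


def cosine_similarity(a, b):
    """Cosine of the angle between two sparse term-weight vectors."""
    dot = sum(weight * b.get(term, 0.0) for term, weight in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


# Hypothetical transcript and query, for illustration only.
transcript = "encryption protects data and encryption keys protect encryption"
doc_vec = term_weights(transcript)
query_vec = term_weights("encryption data")
score = cosine_similarity(doc_vec, query_vec)
```

Each distinct term is one dimension of the space, so a transcript with n distinct terms occupies a point in n-dimensional space, and a higher `score` means the query and transcript point in more similar directions.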


3.2 Searching Videos by Keywords 


Video search involves two steps: analyzing by keywords and analyzing by image references. When a 
keyword is entered, the system looks through the indexed audio transcript to see if there is a match. An 
image reference may refer to either a picture or keywords that describe an object in the video using an 
appropriate semantic space. Video clips whose language contains the keywords will be retrieved. Figure 
1 shows how indexed videos can be searched using keywords in the spoken language. 
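
The keyword lookup described above can be sketched as an inverted index over a time-stamped transcript. This is a minimal sketch, not the inVideo implementation: the segment format (start time in seconds plus text), the helper names, and the sample sentences are assumptions made for illustration.

```python
from collections import defaultdict


def build_index(segments):
    """Map each spoken word to the start times (seconds) where it occurs."""
    index = defaultdict(list)
    for start_seconds, text in segments:
        for word in text.lower().split():
            index[word.strip(".,!?\"'")].append(start_seconds)
    return index


def format_timestamp(seconds):
    """Render seconds as HH:MM:SS."""
    hours, remainder = divmod(seconds, 3600)
    minutes, secs = divmod(remainder, 60)
    return f"{hours:02d}:{minutes:02d}:{secs:02d}"


def search(index, keyword):
    """Return the sorted, de-duplicated timestamps at which keyword is spoken."""
    times = sorted(set(index.get(keyword.lower(), [])))
    return [format_timestamp(t) for t in times]


# Hypothetical transcript segments: (start time in seconds, spoken text).
segments = [
    (0, "Welcome to the lesson."),
    (9, "Every student can help protect the earth."),
    (42, "A student can turn off the lights."),
]
index = build_index(segments)
hits = search(index, "student")  # ["00:00:09", "00:00:42"]
```

Each hit can then be used as a seek position so that playback starts at the clip where the keyword is spoken.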


Figure 1: Analyzing videos by keywords. 

3.3 Searching Videos by References 

Searching videos by references examines the frames of the video to see if the given picture or keyword 
is found. If the reference is a picture, then the system uses a Content Based Image Retrieval (CBIR) 
algorithm to find the matching frames and return the video clips that contain the reference picture. Figure 
2 shows the image-based CBIR algorithm that retrieves the video frames corresponding to the reference 
picture (at the bottom). 



Figure 2: Analyzing videos by image references using the CBIR algorithm. 
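
The paper does not say which image features its CBIR algorithm uses. One common and simple CBIR technique is color histogram comparison, sketched below with histogram intersection as the similarity measure. Frames are represented here as plain lists of RGB tuples. The bin count, the threshold, and all names are assumptions chosen for this example.

```python
from collections import Counter


def color_histogram(pixels, levels=4):
    """Normalized histogram over coarse RGB bins (levels**3 possible bins)."""
    step = 256 // levels
    counts = Counter((r // step, g // step, b // step) for r, g, b in pixels)
    total = len(pixels)
    return {bin_: n / total for bin_, n in counts.items()}


def histogram_intersection(h1, h2):
    """Similarity in [0, 1]; 1.0 means identical color distributions."""
    return sum(min(value, h2.get(bin_, 0.0)) for bin_, value in h1.items())


def matching_frames(frames, reference_pixels, threshold=0.8):
    """Return timestamps of frames whose histogram is close to the reference."""
    ref = color_histogram(reference_pixels)
    return [
        seconds
        for seconds, pixels in frames
        if histogram_intersection(ref, color_histogram(pixels)) >= threshold
    ]


# Hypothetical 16-pixel frames keyed by timestamp in seconds.
red, blue = (200, 10, 10), (10, 10, 200)
frames = [
    (0, [blue] * 16),
    (9, [red] * 14 + [blue] * 2),
    (25, [red] * 8 + [blue] * 8),
]
reference = [red] * 16
matches = matching_frames(frames, reference)  # [9]
```

A production system would typically add shape or texture features, since color alone cannot distinguish two different objects with similar palettes.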

If the reference is a keyword (e.g., "credit card") then the system uses a knowledge tree to find matches 
in the video. If one video frame contains an object matching the features associated with the keyword, 
the section of the video is returned. Figure 3 shows how a search for the keyword "credit card" will 
retrieve the video frames that contain objects recognized as credit cards. 
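
The structure of the knowledge tree is not described in the paper. One plausible reading is a hierarchy in which each node carries visual features and a keyword inherits the features of its ancestors. The sketch below follows that reading. The tree contents, the feature labels, and the per-frame detections are invented purely for illustration.

```python
# Hypothetical knowledge tree: each node has its own features and children.
KNOWLEDGE_TREE = {
    "payment object": {
        "features": {"rectangular"},
        "children": {
            "credit card": {
                "features": {"embossed digits", "magnetic stripe"},
                "children": {},
            },
            "banknote": {"features": {"printed portrait"}, "children": {}},
        },
    },
}


def features_for(keyword, tree=KNOWLEDGE_TREE, inherited=frozenset()):
    """Collect features along the path from the root to the keyword's node."""
    for name, node in tree.items():
        collected = inherited | node["features"]
        if name == keyword:
            return set(collected)
        found = features_for(keyword, node["children"], collected)
        if found is not None:
            return found
    return None


def frames_matching(keyword, detected):
    """Return timestamps of frames that show every feature tied to keyword."""
    wanted = features_for(keyword)
    if wanted is None:
        return []
    return [seconds for seconds, feats in detected if wanted <= feats]


# Hypothetical detector output: (timestamp in seconds, features seen in frame).
detected = [
    (9, {"rectangular", "embossed digits", "magnetic stripe", "hand"}),
    (25, {"rectangular", "printed portrait"}),
]
card_frames = frames_matching("credit card", detected)  # [9]
```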


Figure 3: Analyzing videos by keyword references using a knowledge tree. 

3.4 Searching Videos with Multiple Languages 

Sometimes multiple languages may be found in videos. Transcription engines normally work in only one 
language or in closely related languages. For other languages, a different transcription engine may be 
required. InVideo addresses this problem by allowing videos with different languages to be searched 
from a single user interface. The inVideo system does not translate between languages. It only 
transcribes based on the language of original videos. For example, a Chinese video will result in a 
transcript in Chinese. 

Figure 4 shows the indexing engine properly analyzing the Chinese language. When entering the word 
"student" in Chinese, the video search engine will locate that term in the transcript and return the 
corresponding frames. Currently there are multiple languages that can be analyzed by the inVideo 
system, with more to be added. 
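
A single search interface over transcripts in several languages, with no translation step, might look like the sketch below. It uses substring matching so that languages written without spaces between words, such as Chinese, can be searched the same way as English. The matching strategy, the data layout, and the sample transcripts are assumptions and are not taken from the inVideo system.

```python
def search_all(videos, term):
    """Return (video id, start seconds) for every segment containing term."""
    needle = term.casefold()
    hits = []
    for video in videos:
        for start_seconds, text in video["segments"]:
            if needle in text.casefold():
                hits.append((video["id"], start_seconds))
    return hits


# Hypothetical transcripts, each kept in the language of its own video.
videos = [
    {
        "id": "earth-en",
        "language": "en",
        "segments": [(9, "Every student can help protect the earth.")],
    },
    {
        "id": "earth-zh",
        "language": "zh",
        "segments": [(12, "每个学生都可以保护地球。")],
    },
]
english_hits = search_all(videos, "student")  # [("earth-en", 9)]
chinese_hits = search_all(videos, "学生")      # [("earth-zh", 12)]
```

Because nothing is translated, a query only matches videos transcribed in the same language as the query, which is consistent with the behavior described above.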



Figure 4: Analyzing videos with different languages. 

4 COLLABORATIVE AND INTERACTIVE LEARNING 

The ability to take unstructured video files and bring structure to the data embedded in them greatly 
enhances the value of any MOOC. Because MOOCs make heavy use of video as a medium for providing 
instruction, the ability to search video content to create, access, and organize the data contained within 
the videos is paramount. The information provided by instructional materials holds less value if students 
are unable to access it properly, and thus inVideo's ability to index and structure video-based instruction 
will provide great value to a MOOC's effectiveness. 

Automatically generated video transcripts may have accuracy problems. In addition, the vast number of 
videos in MOOCs makes it impossible to retrieve them correctly with just one or a few simple 
keywords. To solve these problems, we have implemented a collaborative filtering mechanism including 
commenting, tagging, and in-place quizzes. These features improve accuracy and increase interaction 
between students and the online learning system. With collaborative filtering, learning resource 
retrieval on MOOC systems is greatly improved, and better student achievement is therefore expected. 

4.1 Collaborative Filtering 

Collaborative filtering is a process of improving the accuracy of the automatic indexing algorithm by 
leveraging user feedback. This is popular on websites that have millions of users and user-generated 
content. Users are able to create time-stamped comments on videos. These comments can be hidden or 
made public so that someone else who views the video can see the comment at a specific time. These 
comments help increase accuracy of the search tool and transcript and enhance interactions in online 
learning. 
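
A time-stamped comment with a public or hidden flag can be modeled as a small record, as in the sketch below. The field names, the five-second display window, and the rule that authors can always see their own hidden comments are assumptions made for this example.

```python
from dataclasses import dataclass


@dataclass
class Comment:
    author: str
    seconds: int       # playback position the comment is attached to
    text: str
    public: bool = True


def visible_comments(comments, viewer, position, window=5):
    """Comments near the playback position that this viewer may see."""
    return [
        c
        for c in comments
        if abs(c.seconds - position) <= window
        and (c.public or c.author == viewer)
    ]


# Hypothetical comments on one video.
comments = [
    Comment("ana", 45, "Which card is shown here?"),
    Comment("ben", 47, "Private note to self", public=False),
    Comment("cy", 300, "The summary starts here"),
]
shown_to_ana = visible_comments(comments, "ana", 46)
shown_to_ben = visible_comments(comments, "ben", 46)
```

Here `shown_to_ana` contains only the first comment, while `shown_to_ben` also includes his own hidden note.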

Tagging on videos is another implementation in the inVideo system, attaching keyword descriptions to 
identify video frames by category or topic. Videos with identical tags can then be linked together, 
allowing students to search for similar or related content. Tags can be created using words, acronyms, or 
numbers. This is also called social bookmarking. 
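
Linking videos that share a tag amounts to an inverted index from tags to videos. The sketch below shows one way to do this. The video identifiers and tags are hypothetical.

```python
from collections import defaultdict


def build_tag_index(video_tags):
    """Map each lowercased tag to the set of videos that carry it."""
    index = defaultdict(set)
    for video, tags in video_tags.items():
        for tag in tags:
            index[tag.lower()].add(video)
    return index


def related_videos(video, video_tags, index):
    """Return other videos that share at least one tag with the given video."""
    related = set()
    for tag in video_tags[video]:
        related |= index[tag.lower()]
    related.discard(video)
    return sorted(related)


# Hypothetical tags: words, acronyms, or numbers.
video_tags = {
    "video-a": ["encryption", "AES", "2014"],
    "video-b": ["Encryption", "passwords"],
    "video-c": ["phishing"],
}
tag_index = build_tag_index(video_tags)
related = related_videos("video-a", video_tags, tag_index)  # ["video-b"]
```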

A search term usually yields many related results, which in many cases are hard to differentiate. 
Commenting and tagging add additional information, refine the knowledge, and increase the video 
search accuracy. Since information is growing exponentially, these features are extremely helpful for 
students to obtain the knowledge in the least amount of time. 

4.2 In-place Assessment with iQuiz 

Internet computing has the advantage of employing powerful CPUs on remote servers to provide 
applications across the network. inVideo comes with an internet computing-based video quiz system 
(iQuiz) to utilize the computational power of remote servers to provide video quiz services to users 
across the internet. 

Currently, videos are mostly non-interactive, so there are no interactions between students and 
the learning content. Students either view videos online or download them to their personal devices. 
There is no way for educators to know whether a student has understood the content or even to know 
whether the student has viewed the video. 

iQuiz can be used to assess learning outcomes associated with video study. Quizzes can be embedded 
into videos at any place where an instructor wants to assess the outcome of the student's study. iQuiz 
runs as a service on servers. This enables users to execute this resource-intensive application with 
personal computers or iPads, which would not be possible otherwise. 

Instructors can enter the authoring mode where they can write quizzes by indicating the start and stop 
positions on the video and adding questions. Video quizzes are stored in XML format, and are 
automatically loaded while students are watching the video in the learning mode. Answers to the 
quizzes, either correct or incorrect, are also stored in the XML database for immediate assessments. 
Assessment of adaptive learning on videos provides better outcomes for students than the traditional 
video content study with little or no feedback (Wang & Behrmann, 2009). 
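
The paper states that quizzes are stored in XML with start and stop positions but does not give the schema. The sketch below shows what such a record could look like and how an answer could be checked after the XML is reloaded. The element and attribute names (`quiz`, `question`, `choice`, `start`, `stop`, `correct`) are invented for this example and are not the iQuiz format.

```python
import xml.etree.ElementTree as ET


def build_quiz(video_id, start, stop, question, choices, correct_index):
    """Create a quiz element tied to a start/stop position (seconds)."""
    quiz = ET.Element(
        "quiz", {"video": video_id, "start": str(start), "stop": str(stop)}
    )
    ET.SubElement(quiz, "question").text = question
    for i, choice in enumerate(choices):
        flag = "true" if i == correct_index else "false"
        ET.SubElement(quiz, "choice", {"correct": flag}).text = choice
    return quiz


def grade(quiz, selected_index):
    """Return True if the selected choice is marked correct."""
    choices = quiz.findall("choice")
    return choices[selected_index].get("correct") == "true"


# Hypothetical quiz shown between seconds 120 and 150 of a lecture video.
quiz = build_quiz(
    "lecture-01",
    120,
    150,
    "What does encryption protect?",
    ["Data confidentiality", "Screen brightness"],
    0,
)
xml_text = ET.tostring(quiz, encoding="unicode")  # serialized form for storage
reloaded = ET.fromstring(xml_text)                # as loaded in learning mode
is_correct = grade(reloaded, 0)
```

Storing the student's selected index alongside the quiz record would allow the immediate assessment described above.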

4.3 Transform Linear Videos into Interactive Learning Objects 

Video is linear in nature: it is neither interactive nor branching. Using the inVideo tool, 
classical videos can be transformed into a series of video clips with assessments in between and at the 
end, so the video-based learning material becomes interactive. Figure 5 shows a test we conducted that 
turned a 46-minute video into six selected 2-3 minute video clips. The red segments on the stage bar 
are the samples; not all of the video content was used in them. 
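
To make the transformation concrete, the sketch below interleaves six clips with quiz steps and computes how much of a 46-minute video the clips cover. The clip boundaries are hypothetical: each clip is assumed to last 2.5 minutes, within the 2-3 minute range reported above.

```python
def build_sequence(clips):
    """Interleave each (start, end) clip with a quiz step; times in seconds."""
    steps = []
    for number, (start, end) in enumerate(clips, start=1):
        steps.append(("clip", start, end))
        steps.append(("quiz", number))
    return steps


def coverage(clips, video_seconds):
    """Fraction of the full video that the selected clips cover."""
    return sum(end - start for start, end in clips) / video_seconds


# Hypothetical clip boundaries, each 150 seconds long.
clips = [
    (60, 210), (400, 550), (800, 950),
    (1200, 1350), (1700, 1850), (2300, 2450),
]
steps = build_sequence(clips)
covered = coverage(clips, 46 * 60)  # 900 / 2760, about 0.33
```

Under these assumptions the clips cover roughly a third of the original video, which matches the observation that only part of the content is sampled.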


Figure 5: Transforming linear videos into interactive learning objects. A long video is "cropped" into 
2-3 minute video clips with assessments in between. 

5 RESEARCH RESULTS 

To test whether the inVideo system improves learning, we selected the 20 most recent videos from the 
National Science Digital Library (NSDL) in cybersecurity and used the inVideo tool to extract keywords 
that appeared in the transcripts. From this set, we selected the top two ranked keywords: "Target" (data 
breach) and "encryption" (using encryption to secure data). We were confident that those two 
keywords made good discussion topics that could increase classroom interaction. 

As a result, we added two discussion topics to the spring 2014 Masters of Science in Cybersecurity 
program (24 class sections with an average of 25 students in each section). 

Videos lack interactions between learners and the online learning environment. Even worse, videos 
above a certain length will likely never be watched at all because students cannot easily determine what 
content is within it or how to locate that content. To address this issue, we used the inVideo tool to 
index the content and break the large videos into a series of small video clips. By doing so, we made it 
possible for students to watch short video clips covering individual key concepts directly, while retaining 
the ability to view the whole video if necessary. This served not only to increase student interest and 
engagement in the lesson but also more importantly to improve their ability to comprehend and retain 
information. 

Student responses and interactions can be used as a proxy for their degree of engagement with any 
particular part of the course. As one example of how the inVideo indexing served to increase this 
measure, consider Week 2 of the class. In our assessment of past offerings (pre-inVideo) we discovered 
that this part of the course is a "quiet" week, because the individual assignment starting in the week will 
not be due until Week 8. This meant that the interactions in the classrooms dropped significantly from 
Week 1. Based on this assessment, we decided to use the inVideo intervention in an attempt to 
generate more interaction during Week 2 of the course. 



Figure 6: Number of responses for 24 sections — Week 2 discussions. 

Our initial observation of one class was very promising: the total number of responses (defined as 
postings made after viewing a video clip) for Week 2 reached 68, as compared to only 2 for the same 
week in the previous semester. This initial finding encouraged us to investigate the results for all 24 
sections program-wide. Figure 6 shows the number of responses for the 24 sections, comparing Fall 2013 
to Spring 2014 during Week 2. Week 2 student responses across the 24 sections were almost seven times 
higher during Spring 2014 (1,129 responses) than during Fall 2013 (164 responses). 

For the cybersecurity online/hybrid class, we have five graded discussions, one individual assignment, 
one team assignment, and two lab assignments. Two more hands-on exercises (labs) have been added 
since Spring 2014. Data from the team projects, using the same intervention method, show that 
student-student and student-faculty interactions were 6.5 times greater for the courses with the 
inVideo intervention (104 responses compared to 16 responses). We also measured student 
performance against desired learning outcomes. The average grades on team projects and the average 
final grades were higher in Spring 2014 than in Fall 2013. Here we see that the index and data analytics 
tool inVideo, in combination with just-in-time assessment and intervention, improved learning outcomes. 
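
The reported ratios can be checked directly from the response counts given above. The short calculation below uses only those counts. The function name is an arbitrary choice.

```python
def ratio(with_invideo, without_invideo):
    """How many times larger the response count is with the intervention."""
    return with_invideo / without_invideo


# Week 2 discussion responses across 24 sections.
week2_ratio = ratio(1129, 164)
# Team project responses.
team_ratio = ratio(104, 16)

print(f"Week 2: {week2_ratio:.2f}x")        # prints "Week 2: 6.88x"
print(f"Team projects: {team_ratio:.1f}x")  # prints "Team projects: 6.5x"
```

The Week 2 figure of about 6.88 agrees with "almost seven times", and the team project figure is exactly 6.5.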

Based on our findings, we are in the process of breaking up every large learning module into several 
learning objects using inVideo. The new competency-based learning objects will be used to construct the 
knowledge cloud. These new learning modules will consist of many competency-based learning objects, 
and will be more interactive, rational, and accessible. 

We will use inVideo to expand the scope of this research to other activities in courses within the 
cybersecurity program. This tool could also be useful to courses in other disciplines. Using the inVideo 
tool, linear videos are transformed into a series of interactive learning objects. This is vital in an online 
learning environment where interactions and learning outcomes are valued the most. 

6 SUMMARY 

This paper discussed a novel learning analytics tool for analyzing video data in an online learning 
environment and the use of the tool to analyze data generated from classrooms. Video indexing engines 
analyze both audio and visual components of a video, and the results of this analysis provide novel 
opportunities for search. Indexed videos can be further enriched through collaborative commenting and 
tagging, and video quizzes built on them can be used to assess learning outcomes. 

Learning analytics based on indexed video can be generated in a number of ways: first by analyzing 
keywords that appear in the audio track; second, by analyzing people or objects that appear in the video 
frames; third, by analyzing these objects based on descriptive keywords; and finally by analyzing with 
different languages. This technology is especially useful when it comes to mining video data in a learning 
environment. 

To improve accuracy, we can improve the transcription engine, better analyze video frames where 
there is no audio, or crowd-source accuracy through collaborative filtering. For transcription accuracy, 
one potential improvement is a self-learning artificial intelligence (AI) system that could be taught 
to recognize certain accents or languages. The process or requirements for 
instituting such a system and the magnitude of the improvement in accuracy are yet to be studied. 

Further research in profiling will increase accuracy. Exposing inVideo as a web API that supports 
commenting and tagging, and using cloud computing technology to add more user interactions, will make 
the application useful to a wider range of users. 

At present, the inVideo tool is limited to analyzing native (non-streaming) videos. Since we are using 
many streaming videos from various sources in the courses, adding a streaming video analysis feature 
would be very helpful for the online classroom data analytics and assessment. 

The initial assessment and intervention yielded significant improvements in student interaction in the 
cybersecurity classrooms. The activities and responses in classrooms increased, student-student and 
student-faculty interactions were enhanced, and the grades for team projects and exams both improved. 